pacman::p_load(sf,sfdep,tidyverse,tmap)Take-Home_Ex1
Analysis of Geospatial Distribution of Bus Ridership by Origin Bus Stop in Singapore
Objectives
The bus system is one of Singapore’s two pillars of public transport aside from the MRT. The bus system ensures convenient and affordable short-, medium-, and long-distance travel for riders. Thanks to the widespread availability of bus stops as compared to MRT stations, it has a high level of accessibility. However, this also leaves the system prone to under- or over-investment in terms of the number of bus routes, leading some stops and routes to be under-served or over-served.
The objective of this study is to examine the distribution of bus trips in Singapore by analyzing the number of trips by originating bus stops. It will consist of two levels of analysis:
- GeoVisualisation and Analysis: Visualizing the number of trips by originating bus stops and provide descriptive statistics of the distribution of trips by bus stops.
- Local Indicators of Spatial Association Analysis (LISA): This analysis involves the calculation of Local Moran’s I to determine local spatial autocorrelation between a bus stop and its neighbors. Additionally, visualizations such as a LISA cluster map will be created for easier comparison.
Getting Started
First, the necessary R packages will be loaded using the p_load() function of the pacman package. p_load() will also install any package which is not already installed. The following packages will be loaded:
sf: For handling of geospatial data.
sfdep: For determining the spatial dependence of spatial features. The three main categories of functionality relates to the determination of geometry neighbors, weights, and LISA.
tidyverse: For manipulation of non-spatial data. This package contains ggplot2 for plotting, dplyr and tidyr for dataframe manipulation, and readr for reading comma-separated values (CSV).
tmap: For thematic mapping, especially the mapping of simple features data frame.
Importing Required Data
For the purpose of this study, two types of data will be used: geospatial data which consists of spatial features and their coordinates information, and aspatial data which consists of attributes which can be ascribed to the geospatial data. Specifically, the following datasets will be used for each type:
- Geospatial Data:
- BusStop.shp: This shape file contains the location of the bus stops in Singapore as at July 2023. This file can be retrieved from the Land Transport Authority (LTA) Data Mall (link).
- Aspatial Data:
- origin_destination_bus_202309.csv: This CSV file contains the detail of bus trips from an originating bus stop to a destination bus stop, identified by their unique codes, each hour of the day during September 2023. The data is further broken down into weekend or weekday, but not by the specific day of the week. This data can be retrieved by using the LTA Data Mall’s API (link).
The first steps taken will be to import these files into the R environment in a manipulable format.
Importing Geospatial Data
Geospatial data can be imported using the st_read() function of the sf package. This will import the file into the R environment as a sf (simple features) data frame. st_transform() is added to transform the Coordinate Reference System (CRS) to EPSG: 3414, which is the CRS of Singapore.
In st_read():
dsn: the directory where the shape file is stored
layer: the name of the shape file
busstop <- st_read(dsn = 'data/geospatial',
layer = 'BusStop') %>%
st_transform(crs = 3414)Reading layer `BusStop' from data source
`D:\phlong2023\ISSS624\Take-Home_Ex\Take-Home_Ex1\data\geospatial'
using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension: XY
Bounding box: xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
From the message provided by R, it can be seen that the busstop sf data frame has 5161 rows, 3 columns, and has a CRS of SVY 21.
To get a better grasp of the busstop data frame, glimpse() function can be used.
The data type for each column can be seen as well as some of their values. For sf data frames, there is a geometry column (POINT type) which contains the location information for each polygon.
glimpse(busstop)Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…
Additionally, busstop can be visualized in order to spot any anomaly. This can be done using the qtm() function in the tmap package for quick plotting.
qtm(busstop)
The visualization shows us that there are four bus stops in Malaysia. Let’s remove them so that only bus stops in Singapore will be considered. This is because these special bus stops might exhibit different behaviors due to their different context from the rest of the bus stops in Singapore.
filter() can be used in conjunction with a dplyr step to remove these bus stops.
busstop <- busstop %>%
filter(!BUS_STOP_N %in% c('46609','47701', '46211', '46219', '46239'))qtm() can be used again to check that the bus stops have been removed.
Importing Aspatial Data
The read_csv() function of readr can be used to import the origin_destination_bus_202309 CSV file into the R environment as a data frame.
passenger <- read_csv('data/aspatial/origin_destination_bus_202309.csv')From the message provided by R, it can be seen that the passenger has 5,714,196 rows and 7 columns.
head() can be used instead of glimpse() to view the top five rows of the passenger data frame. This will also allow us to see the data type of each of the column.
head(passenger)# A tibble: 6 × 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <chr> <chr>
1 2023-09 WEEKENDS/… 17 BUS 24499 22221
2 2023-09 WEEKENDS/… 10 BUS 65239 65159
3 2023-09 WEEKDAY 10 BUS 65239 65159
4 2023-09 WEEKDAY 7 BUS 23519 23311
5 2023-09 WEEKENDS/… 7 BUS 23519 23311
6 2023-09 WEEKENDS/… 11 BUS 52509 42041
# ℹ 1 more variable: TOTAL_TRIPS <dbl>
Note that the ORIGIN_PT_CODE and DESTINATION_PT_CODE are in the character (“chr”) data type. However, we would like it to be in the factor (“fctr”) data type for easier categorization and sorting. This can be done by using the as.factor() function.
passenger$ORIGIN_PT_CODE <- as.factor(passenger$ORIGIN_PT_CODE)
passenger$DESTINATION_PT_CODE <- as.factor(passenger$DESTINATION_PT_CODE)We can use head() to check the data type of the passenger data frame.
head(passenger)# A tibble: 6 × 7
YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
<chr> <chr> <dbl> <chr> <fct> <fct>
1 2023-09 WEEKENDS/… 17 BUS 24499 22221
2 2023-09 WEEKENDS/… 10 BUS 65239 65159
3 2023-09 WEEKDAY 10 BUS 65239 65159
4 2023-09 WEEKDAY 7 BUS 23519 23311
5 2023-09 WEEKENDS/… 7 BUS 23519 23311
6 2023-09 WEEKENDS/… 11 BUS 52509 42041
# ℹ 1 more variable: TOTAL_TRIPS <dbl>
Data Preparation
In order to perform our analysis, certain manipulations must be made in order to prepare the data. Specifically, the passenger data set will be filtered and summarzied. Subsequently, it will be combined with the busstop data set based on the bus stop code variable present in both data frames.
Wrangling Aspatial Data
Filtering the passenger Data Set for Desired Time Frames
For the purpose of this study, the passenger data set needs to be filtered to only contain trips falling within one of the following time frames:
| Peak hour period | Bus tap on time |
|---|---|
| Weekday morning peak | 6am to 9am |
| Weekday afternoon peak | 5pm to 8pm |
| Weekend/holiday morning peak | 11am to 2pm |
| Weekend/holiday evening peak | 4pm to 7pm |
This can be accomplished using the filter() function and the dplyr steps. We can create four separate data frames to store the four different time frames
# Weekday morning peak 6am - 9am
passenger_wd_69 <- passenger %>%
filter(DAY_TYPE == 'WEEKDAY') %>%
filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720 <- passenger %>%
filter(DAY_TYPE == 'WEEKDAY') %>%
filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 20)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114 <- passenger %>%
filter(DAY_TYPE == 'WEEKENDS/HOLIDAY') %>%
filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 14)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619 <- passenger %>%
filter(DAY_TYPE == 'WEEKENDS/HOLIDAY') %>%
filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 19)After the different trips have been categorized into their separate data frames, the total number trips for each origin bus stop can be tallied into a single statistic for the study period. This can be accomplished using the summarize() function. The example below shows this operation using passenger_wd_69.
The group_by() function is used to instruct R to conduct operations based on the groups created by group_by(). In this case, the summary operations will be done based on the origin bus stop codes.
# Tallying the trips by origin bus stop for Weekday morning peak 6am - 9am
passenger_wd_69_tallied <- passenger_wd_69 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
passenger_wd_69_tallied# A tibble: 5,020 × 2
ORIGIN_PT_CODE TRIPS
<fct> <dbl>
1 01012 1640
2 01013 764
3 01019 1322
4 01029 2373
5 01039 2562
6 01059 1582
7 01109 144
8 01112 7993
9 01113 6734
10 01119 3736
# ℹ 5,010 more rows
As can be seen, the newly created data frame consists only of the total trip numbers for each origin bus stop. This can be repeated for the other time frames.
# Tallying the trips by origin bus stop for Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720_tallied <- passenger_wd_1720 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Tallying the trips by origin bus stop for Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114_tallied <- passenger_weh_1114 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))
# Tallying the trips by origin bus stop for Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619_tallied <- passenger_weh_1619 %>%
group_by(ORIGIN_PT_CODE) %>%
summarise(TRIPS = sum(TOTAL_TRIPS))Wrangling Geospatial Data
In order to adequately visualize the busstop sf data frame, we need to define a mapping layer. An example of a mapping layer would be to use the Master Plan 2019 Planning Sub-zone created by the Urban Redevelopment Authority (URA). However, for the purpose of this study, a hexagon layer will be used to ensure standardization of the size of each polygon and the evenly spaced gaps between a polygon and its neighbors.
The steps in this section will detail the creation of the hexagon layer using the busstop data frame and visualize the layer on a map of Singapore.
Creating a Hexagon Layer in R
The steps taken in this section is based on the guide provided by Kenneth Wong of Urban Data Palette (link).
Firstly, a hexagon or honeycomb grid can be created based on the busstop data frame using the st_make_grid() function.
There are some notable arguments in the st_make_grid() function:
cellsize = c(100,100): This argument indicates the size of each hexagon. If the cell size is large, each hexagon can encompasses multiple bus stops, whereas if a smaller cell size can help us differentiate between individual bus stop. However, a smaller cell size with many hexagons will take more time to create.
what = ‘polygons’: We would like to create polygons on a grid.
square = FALSE: The default argument is TRUE, which would create a square grid. FALSE is specified in order to create a hexagon grid.
area_honeycomb_grid = st_make_grid(busstop, cellsize = c(500,500), what = 'polygons', square = FALSE)
area_honeycomb_gridGeometry set for 5040 features
Geometry type: POLYGON
Dimension: XY
Bounding box: xmin: 3470.122 ymin: 26193.43 xmax: 48720.12 ymax: 50586.47
Projected CRS: SVY21 / Singapore TM
First 5 geometries:
The area_honeycomb_grid contains 136906 features of the same Projected CRS as the busstop data frame. If the plot() function is used, the hexagon grid will be displayed. However, this grid contains no information and might be too small to discern the individual cell.
#qtm(area_honeycomb_grid)The area_honey_comb needs to be converted to a sf data frame for further manipulation using st_sf(). Additionally, we can assign a unique id to each of the hexagon cell in area_honey_comb using mutate().
honeycomb_grid_sf = st_sf(area_honeycomb_grid) %>%
# add grid ID
mutate(grid_id = 1:length(lengths(area_honeycomb_grid)))Following this, we can use lengths() and st_intersect() to determine the allocation of bus stop in each cell. The goal is to create a new column, consisting of the number of bus stop in each of the cell. The filter() function can then be added to remove all cells with no bus stop and create the final sf data frame.
# Counting the number of bus stop in each cell
honeycomb_grid_sf$n_busstop = lengths(st_intersects(honeycomb_grid_sf,busstop))
# Removing all cells without bus stop
honeycomb_count = filter(honeycomb_grid_sf, n_busstop > 0)At this point, the hexagon grid of bus stop can be drawn onto a map of Singapore using the functions of the tmap package. Additionally, the n_busstop column can be passed to the tm_fill() function to shade the cell based on the number of bus stops in it.
Several functions are added to make the map interactive and aesthetically pleasing
tmap_mode(‘view’): Creates an interactive map which allow zooming and interacting with cells on the map
pop.vars: Identifying the legend and value which pops up when a cell is selected. In this case, it is the number of bus stops.
popup.format: Specifying the format of the variable to be displayed when selecting a cell.
tm_basemap: Choosing the basemap layer on which the hexagon grid will be drawn. OpenStreetMap is chosen due to its high fidelity while not being overly crowded. Additionally, OpenStreetMap displays icon for bus stops in Singapore, allowing user to visually check any cell.
- If an incorrect CRS was specified in the earlier steps, the basemap will be of an incorrect location or alignment.
tmap_mode('view')
bushexmap <- tm_shape(honeycomb_count)+
tm_fill(
col = "n_busstop",
palette = "Blues",
style = "cont",
title = "Number of bus stop",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of bus stop: " = "n_busstop"
),
popup.format = list(
n_busstop = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))
bushexmapSearch for the bus stop near your place and compare the numbers between the four maps by zooming in! Do you think it’s accurate?
By looking at the illustration, we can see that each cell might contain up to ten bus stops. A bar chat can be drawn with the ggplot2 package to visualize the distribution of number of bus stop in each cell.
ggplot(honeycomb_count, aes(x=n_busstop))+
geom_bar()+
theme_classic()+
geom_text(aes(label = ..count..), stat = "count", vjust = -0.5, colour = "black")
As can be seen, the majority of cells contain 1-2 bus stop with only 214 cells containing more than 6 bus stops. This shows that the cells adequately capture solitary bus stop, as well as pairs of bus stops (bus stops which are opposite each other, served by the same bus services).
Combining Aspatial and Geospatial Data
In order to conduct geospatial analysis, a data frame which contains the hexagon cells as well as the number of bus trips for each cells must be created. This can be done using the left_join argument.
There are important arguments which can be used to create a cleaner combined data frame.
by = join_by(BUS_STOP_N == ORIGIN_PT_CODE)): Indicate the column by which the two data frames can be matched and joined. In this case, the bus stop code will be used.
select(1,4,5): Indicate the index number of the columns to be kept in the final data frame. Only the bus stop number (column 1), total number of trips (column 4), and geometry (column 5) will be kept.
replace(is.na(.),0): Replace all value of NA with 0. This is to ensure that bus stop with no trips in a given time frame is accurately tallied at 0.
# Weekday morning peak 6am - 9am
passenger_wd_69_combined <- left_join(busstop, passenger_wd_69_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
passenger_wd_1720_combined <- left_join(busstop, passenger_wd_1720_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
passenger_weh_1114_combined <- left_join(busstop, passenger_weh_1114_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
passenger_weh_1619_combined <- left_join(busstop, passenger_weh_1619_tallied, by = join_by(BUS_STOP_N == ORIGIN_PT_CODE))%>%
select(1,4,5)%>%
replace(is.na(.),0)It is important to note that the bus stops and their total trips have not been tallied into the hexagon cells. st_join() can be used to accomplish this for each time frame.
The by argument in st_join() can be passed the function st_within to specify that we would like to join the two data frames where the geometry in the latter is within the geometry of the former. In this case, it would mean that two rows will be joined where the bus stop lies within a particular polygon.
- The group_by() and summarise() functions here are used similarly to before, they sums up the total number of trips for all the bus stops in the hexagon, based on its grid_id, and create a new column called TOTAL_TRIP.
# Weekday morning peak 6am - 9am
hex_passenger_wd_69 <- st_join(honeycomb_count, passenger_wd_69_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
hex_passenger_wd_1720 <- st_join(honeycomb_count, passenger_wd_1720_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
hex_passenger_weh_1114 <- st_join(honeycomb_count, passenger_weh_1114_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
hex_passenger_weh_1619 <- st_join(honeycomb_count, passenger_weh_1619_combined, by = st_within(sparse = FALSE), largest = TRUE) %>%
group_by(grid_id)%>%
summarise(TOTAL_TRIP = sum(TRIPS))In sum, the analysis will revolve around the four following data frames which contain the spatial information of the hexagon as well as the total number of trips for each interested time frame:
- hex_passenger_wd_69: Weekday morning peak 6am - 9am
- hex_passenger_wd_1720: Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
- hex_passenger_weh_1114: Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
- hex_passenger_weh_1619: Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
Cleanup Step
Before we move on to Geovisualization and the analysis of LISA, it would be wise to remove objects which will no longer be used. This will help to free up memory for other tasks. rm() can be used to perform this task
rm(list = c('area_honeycomb_grid', 'bushexmap', 'honeycomb_count', 'honeycomb_grid_sf', 'passenger_wd_1720', 'passenger_wd_1720_tallied', 'passenger_wd_1720_combined', 'passenger_wd_69', 'passenger_wd_69_tallied', 'passenger_wd_69_combined', 'passenger_weh_1114', 'passenger_weh_1114_tallied', 'passenger_weh_1114_combined', 'passenger_weh_1619', 'passenger_weh_1619_tallied', 'passenger_weh_1619_combined'))Geovisualization and Analysis
The beginning step of the analysis would be to visualize the distribution of bus trips on the hexagon layer. This can be accomplished with the mapping functions of the tmap package. However, unlike the geovisualization of bus stop per hexagon, the number of trips for each hexagon depending on the time frame varies widely. Let’s confirm this by drawing a histogram of the distribution of trips in each time frame.
weekday_morning_hist <- ggplot(hex_passenger_wd_69, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekday Morning')+
theme_classic()
weekday_evening_hist <- ggplot(hex_passenger_wd_1720, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekday Evening')+
theme_classic()
weekend_noon_hist <- ggplot(hex_passenger_weh_1114, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekend Noon')+
theme_classic()
weekend_evening_hist <- ggplot(hex_passenger_weh_1619, aes(x=TOTAL_TRIP))+
geom_histogram()+
scale_x_continuous(labels = scales::comma)+
ggtitle('Weekend Evening')+
theme_classic()
gridExtra::grid.arrange(weekday_morning_hist, weekday_evening_hist, weekend_noon_hist, weekend_evening_hist,
nrow = 2, ncol = 2)
Upon a brief inspection, it is possible to see that the range of trips between the different time periods are very different from each other, with Weekday Evening trips going above 300,000 for some hexagons, while Weekend Noon only ranging around 80,000 for its hexagons. By plotting this on map, we will get to see the geospatial distribution of the number of trips for each time frame.
Before we map all four time frames, however, it is important to consider that the ‘style’ argument of tm_fill() can take on many values (pretty, quantile, equal, etc.). In order to pick an appropriate ‘style’, we can pick one time frame and plot it using three different styles to determine the best way to depict the distribution.
tmap_mode('view')
### Weekday morning peak 6am - 9am
# Quantile style
weekday_morning_quantile <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "quantile",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Quantile)', title.position = c('right', 'top'))
# Jenks style
weekday_morning_jenks <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "jenks",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Jenks)', title.position = c('right', 'top'))
# Equal style
weekday_morning_equal <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "equal",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak (Equal)', title.position = c('right', 'top'))
tmap_arrange(weekday_morning_quantile, weekday_morning_jenks, weekday_morning_equal, asp = 1, ncol = 3)As can be seen, the ‘quantile’ and ‘jenks’ style can better display the difference in distribution of trips between the different hexagons. This can be attributed to the fact that the range of trips between the hexagons are too large. However, the ‘quantile’ function suffers from its final grouping, containing values from roughly 5,539 to 136,490; this resulted int the oversaturation of the high value hexagons. On the other hand, the ‘jenks’ method “divides the features into classes whose boundaries are where there are relatively big differences in the data values” (Reference). Therefore, it is possible to move forward using the ‘jenks’ style for visualization, but by adjusting the breaks to provide more stratification in the hexagon colors for better visualization..
It is good to remove the 3 plots created to plot the different styles since they will not be used and will only take up memory.
rm(list=c('weekday_morning_equal','weekday_morning_quantile','weekday_morning_jenks'))The functions of the tmap package will be used to create the maps.
tmap_mode('view')
# Weekday morning peak 6am - 9am
weekday_morning <- tm_shape(hex_passenger_wd_69) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "jenks",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Morning Peak', title.position = c('right', 'top'))
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
weekday_afternoon <- tm_shape(hex_passenger_wd_1720) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "jenks",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekday Afternoon Peak', title.position = c('right', 'top'))
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weekend_noon <- tm_shape(hex_passenger_weh_1114) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "jenks",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekend/holiday Morning Peak', title.position = c('right', 'top'))
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weekend_evening <- tm_shape(hex_passenger_weh_1619) +
tm_fill(
col = "TOTAL_TRIP",
palette = "Blues",
style = "jenks",
title = "Number of Trips",
id = "grid_id",
showNA = FALSE,
alpha = 0.7,
popup.vars = c(
"Number of Trips: " = "TOTAL_TRIP"
),
popup.format = list(
TOTAL_TRIP = list(format = "f", digits = 0)
)
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))+
tm_layout(title = 'Weekend/holiday Evening Peak', title.position = c('right', 'top'))
tmap_arrange(weekday_morning, weekday_afternoon, weekend_noon, weekend_evening, nrow = 2, ncol = 2)Search for the bus stop near your place and compare the numbers between the four maps by zooming in! Do you think it’s accurate?
From the rough visualization, it’s clear that not all bus stops experience a similar level of traffic throughout different timing of the days. Additionally, based on the quantiles created by tmap, it seems that the ranges of passenger traffic are radically different between the four time windows. For example, certain grids during Weekday Morning Peak, a hexagon could reach 328,545 passenger trips, whereas the highest number during Weekend/holiday Morning Peak only reaches 112,330. It appears that Weekend Evening Peak 17:00 - 20:00 is the period with the highest level of activity.
An important observation seems to be that there darker shaded hexagons tend to be clustered, indicating a positive spatial autocorrelation. However, there are clearly hexagons which are much darker than its surrounding tiles (such as the one near Tampines), indicating negative spatial autocorrelation. By conducting LISA analysis, it will be possible to determine the level of spatial autocorrelation for each hexagon, visualize them, as well as to depict the relationship between a hexagon and its neighbors through a LISA cluster map.
Local Indicators of Spatial Autocorrelation (LISA)
Defining the Neighborhood
Before the LISA analysis can be conducted, it is important to define our neighborhood, or the neighbors of each polygon. This is based on the ideas that neighbors, or spatial objects which are related to other spatial objects based on sharing a common boundary or lying with a certain distance of one another, might affect each other.
In this case, we would like to identify the neighbors of each hexagon so that LISA analysis can determine if the number of trips in a hexagon in a time frame is correlated to the number of trips of the hexagons around it, either in the same direction or opposite direction.
The first step to determining the neighborhood is to choose a method by which neighbors are classified:
- Contiguity-based Method: Based on the sharing of boundaries, either edges and/or points (Queen and Rook method).
- Distance-based Method: Based on the distances between the centroid (central point of each hexagon) of each polygon. This can either be set to a distance where each hexagon has at least one neighbor (Fixed Distance) or where each polygon has a certain number of neighbors (Adaptive Distance).
It is important to choose the appropriate method according to each situation. However, in this study, Contiguity-based methods can be preemptively ruled out due to the fact that some cells have no contiguous neighbors, as can be seen below.

If the contiguity method is used, the LISA calculation for these cells would not be conducive for analysis as they technically have no neighbors. This can be confirmed by using the st_contiguity() function to create a Queen contiguity matrix for one time frame
wm_q <- st_contiguity(st_geometry(hex_passenger_wd_1720))
summary(wm_q)Neighbour list object:
Number of regions: 1520
Number of nonzero links: 6874
Percentage nonzero weights: 0.2975242
Average number of links: 4.522368
9 regions with no links:
561 726 980 1047 1415 1505 1508 1512 1520
Link number distribution:
0 1 2 3 4 5 6
9 39 109 205 291 364 503
39 least connected regions:
1 7 22 38 98 169 187 195 211 218 258 259 264 267 287 454 562 607 642 696 708 732 751 784 869 1021 1022 1046 1086 1214 1464 1471 1482 1500 1501 1503 1506 1510 1519 with 1 link
503 most connected regions:
10 13 16 17 24 25 31 35 42 43 48 53 55 60 63 67 73 77 80 81 84 85 87 88 91 92 97 102 107 111 117 121 127 133 140 141 143 148 149 150 154 155 156 157 163 164 165 173 174 175 183 184 185 191 192 193 194 200 201 202 205 206 207 208 216 229 239 243 244 246 257 266 271 278 279 283 284 291 292 298 300 301 302 304 309 310 312 313 316 321 324 325 327 337 338 339 340 343 352 355 363 368 390 391 400 402 403 407 414 418 423 425 431 436 437 438 440 443 450 451 452 461 466 467 468 469 473 477 480 481 485 489 493 494 496 502 503 507 513 514 517 518 523 529 534 539 543 548 549 550 552 556 558 564 568 573 574 576 577 581 590 591 594 598 599 604 605 609 615 619 624 626 633 636 637 638 648 649 650 654 655 657 658 659 669 670 671 677 680 681 682 687 688 690 691 700 701 704 705 706 713 716 717 724 727 728 729 740 741 755 757 758 760 771 774 775 776 777 782 783 787 788 789 792 793 794 795 799 800 806 807 810 811 812 813 819 820 823 824 825 830 831 832 841 843 844 846 847 848 850 851 852 853 854 860 863 865 866 867 871 872 876 877 878 880 881 882 884 885 887 888 891 893 896 899 902 905 906 910 914 919 921 926 927 928 930 931 935 937 943 944 945 946 947 948 954 958 959 962 963 968 969 971 972 973 977 984 985 986 987 988 990 996 997 998 999 1004 1011 1012 1013 1014 1024 1025 1026 1028 1029 1036 1037 1038 1042 1050 1051 1054 1056 1057 1062 1063 1064 1066 1067 1068 1069 1076 1078 1079 1080 1083 1089 1093 1100 1101 1102 1105 1106 1110 1111 1117 1120 1121 1122 1128 1133 1134 1135 1136 1141 1142 1144 1145 1146 1147 1148 1150 1156 1157 1158 1162 1163 1164 1166 1169 1170 1171 1172 1176 1177 1178 1179 1180 1184 1186 1190 1191 1192 1193 1194 1201 1202 1203 1204 1205 1206 1207 1210 1211 1217 1218 1219 1220 1221 1227 1233 1234 1235 1239 1244 1245 1251 1253 1254 1255 1261 1265 1266 1271 1272 1273 1277 1281 1283 1289 1299 1301 1302 1303 1304 1316 1318 1324 1325 1326 1327 1329 1330 1331 1334 1335 1336 1337 1343 1344 1345 1352 1353 1355 1356 1361 1365 1366 1368 1369 1371 1372 1377 1380 1381 1382 1384 1388 1391 1393 1395 1398 1406 1408 1412 1417 1418 1420 1424 1425 1426 1427 1428 1433 1434 1435 1436 1440 1441 1442 1446 1447 1449 1451 1453 1456 1457 1459 1460 1461 1462 1469 with 6 links
As can be seen from the weight matrix, 10 hexagon cells have 0 neighbor. Therefore, a Distance-based Method would be suitable for analysis. Both Fixed Distance and Adaptive Distance Weight Matrix can be created for comparison to find the most appropriate method.
Fixed Distance Weight Matrix
Creating the Distance Matrix
st_dist_band() of sfdep is incredibly powerful in that it can create a neighbor list based on distance between the centroid of polygons and a lower and upper bound distance to other centroid. The default arguments for st_dist_band() will find neighbors with a lower and upper bound so that each hexagon will have at least one neighbor. This is the equivalent of the steps in spdep of the function knearneigh() of k=1.
st_inverse_distance() of a sfdep can be combined with st_dist_band() in a dplyr step to create a new column in each data frame of the different time frames to create a neighbor list and a inverse distance weight list. Additionally, st_lag() can be used to create a spatially lagged value column for total trips based on the weight of neighbors.
# Weekday morning peak 6am - 9am
wd69_nb <- hex_passenger_wd_69 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
wd1720_nb <- hex_passenger_wd_1720 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weh1114_nb <- hex_passenger_weh_1114 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the front
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weh1619_nb <- hex_passenger_weh_1619 %>%
mutate(nb = st_dist_band(area_honeycomb_grid),
wt = st_inverse_distance(nb, area_honeycomb_grid),
lag_trip = st_lag(TOTAL_TRIP,nb,wt),
.before = 1) # to put them in the frontSince the hexagon tiles is stable across all time frame, it is possible to plot the neighbor relationship for only one time frame in order to visualize the neighbors list created. Let’s take Weekday morning peak 6am - 9am. By visualizing, the appropriateness of the Fixed Distance Method can be roughly determined.
plot(wd69_nb$area_honeycomb_grid, border = 'lightgrey')
plot(st_knn(wd69_nb$area_honeycomb_grid, k = 1), st_centroid(wd69_nb$area_honeycomb_grid), add=TRUE, col = 'red', length = 0.08)
Between the Fixed Distance and Adaptive Distance Matrix, an Adaptive Distance Matrix would be more appropriate to the non-uniform nature of bus stop spatial distribution across the map. Using Adaptive Distance would allow for the specification of the number of neighbors, allowing for the standardisation of the LISA analysis process across hexagons.
Calculating LISA
local_moran() of the sfdep package can be used to calculate Local Moran’s I Statistic and other related statistics.
# Weekday morning peak 6am - 9am
wd69_lisa <- wd69_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekday afternoon peak 5pm - 8pm (17:00 - 20:00)
wd1720_lisa <- wd1720_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekend/holiday morning peak 11am - 2pm (11:00 - 14:00)
weh1114_lisa <- weh1114_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)
# Weekend/holiday evening peak 4pm - 7pm (16:00 - 19:00)
weh1619_lisa <- weh1619_nb %>%
mutate(local_moran = local_moran(TOTAL_TRIP, nb, wt, nsim = 199),
.before = 1) %>%
unnest(local_moran)tmap_mode('view')
tm_shape(wd1720_lisa)+
tm_fill(col = 'mean',
palette = "RdBu",
style = "cat",
title = "ii"
) +
tm_borders(col = "grey40", lwd = 0.7)+
tm_basemap(server = c('Esri.WorldGrayCanvas'))